A method for assessing risk of disclosure in census microdata and tabular data
Author
Abstract
Confidentiality in personal databases is currently of considerable interest. This paper concentrates on current research into measuring the risk of disclosure of information about individuals in census data, which the authors believe is directly applicable to other data sources, such as employer-employee databases. Census offices in different countries have used various methods to ensure that confidentiality requirements are maintained. These include both legal safeguards (typically the user of census data has to sign a binding agreement that they will not misuse the data) and the use of various mechanisms to anonymise and modify data prior to release. The most common methods used are data aggregation, data perturbation and data suppression. There has been little quantitative research to investigate the extent to which any of these methods achieve their stated aim or the extent to which they change the data. The measurement of confidentiality risks is crucial to the comparison of different protection methods. Quantitative measures of risk of disclosure would allow researchers to comment critically on the various protection methods associated with current practice and, based on those assessments, to give advice about the degree to which data should be modified in order to ensure confidentiality. This paper describes research carried out using a new method developed to assess the risk of disclosure in both microdata and tabular data. The work is related to the design of output areas for the reporting of census results. This requires a measure of risk which can be assessed for specific contexts (such as local areas) rather than for an entire dataset. This localisation of risk assessment allows datasets to be produced in which there is an efficient balance between security and utility of the data.
Where risk of disclosure is found to be unacceptable, a proposed area can be rejected, and by a process of redesign and reassessment the area can be modified until risk has been reduced to an acceptable level; where the risk of disclosure is found to be low, the proposed area can be accepted. The alternative, assessing risk for an entire dataset globally, will generally lead to a situation where areas of high risk are protected at the ...
Similar resources
Statistical Disclosure Control Methods for Census Frequency Tables
This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach ...
Statistical Disclosure Control: New Directions and Challenges
Traditionally, statistical agencies release outputs in the form of microdata and tabular data. Microdata contain data from social surveys and tabular data contain either frequency counts, such as for census dissemination, or magnitude data typically arising from business surveys, e.g. total revenue. For each of these traditional outputs, there has been much research on how to quantify ...
Software for tabular data protection.
In order for national statistical offices to maintain the trust of the public to collect data and publish statistics of importance to society and decision-making, it is imperative that respondents (persons or establishments) be guaranteed privacy and confidentiality in return for providing requested confidential data. Consequently, for most survey and census data, disclosure limitation techniqu...
Assessing the Protection Provided by Misclassification-based Disclosure Limitation Methods for Survey Microdata
Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect the confidentiality of respondents. There is a need for valid and practical ways to assess the protection provided. This paper develops some simple methods for disclosure limitation techniques which perturb the values of categorical identifying variables. The methods are appli...
Using Noise for Disclosure Limitation of Establishment Tabular Data
We propose a new disclosure limitation method for establishment magnitude tabular data in which noise is added to the underlying microdata prior to tabulation. The proposed method has several advantages compared to the standard method of cell suppression: it enables some information to be provided within more cells of the table, it eliminates the need to coordinate cell suppression patterns bet...